@gregw gregw commented Oct 21, 2025

Fix #13470 by calling completeStream only once even when there is a failure in the channel callback.

gregw commented Oct 21, 2025

The issue is fundamentally that completeStream is being called twice.

For this to happen, a failure must be detected in the ChannelCallback.succeeded() method so that the following code is run:

if (failure != null)
{
    httpChannelState._callbackFailure = failure;
    if (!stream.isCommitted())
        errorResponse = new ErrorResponse(request);
    else
        completeStream = true;
}

This means that completeStream will be called even though the other "legs of the 3 legged stool" are not complete. Specifically, we may still be inside the call to HandlerInvoker.run(), as in this stack for thread 258:

2025-10-20 08:14:42,457 INFO  [WebServerImpl-258] trace.jetty.session.complete: complete() called on session [ManagedSession@4df349fb{id=MYSECRETSESSIONID,x=MYSECRETSESSIONID.node0,req=6,res=true}]
java.lang.Exception: complete stack
	...
	at org.eclipse.jetty.session.AbstractSessionManager.complete(AbstractSessionManager.java)
	at org.eclipse.jetty.session.AbstractSessionManager$SessionStreamWrapper.doComplete(AbstractSessionManager.java:1509)
	at org.eclipse.jetty.server.handler.ContextHandler$ScopedContext.run(ContextHandler.java:1518)
	at org.eclipse.jetty.session.AbstractSessionManager$SessionStreamWrapper.failed(AbstractSessionManager.java:1479)
	at org.eclipse.jetty.server.internal.HttpChannelState$HandlerInvoker.completeStream(HttpChannelState.java:788)
	at org.eclipse.jetty.server.internal.HttpChannelState$ChannelCallback.succeeded(HttpChannelState.java:1591)
	at org.eclipse.jetty.server.handler.gzip.GzipResponseAndCallback.succeeded(GzipResponseAndCallback.java:95)
	at org.eclipse.jetty.ee10.servlet.ServletChannel.onCompleted(ServletChannel.java:765)
	at org.eclipse.jetty.ee10.servlet.ServletChannel.handle(ServletChannel.java:429)
	at org.eclipse.jetty.ee10.servlet.ServletHandler.handle(ServletHandler.java:470)
	at org.eclipse.jetty.ee10.servlet.SessionHandler.handle(SessionHandler.java:717)
	at org.eclipse.jetty.server.handler.ContextHandler.handle(ContextHandler.java:1071)
	at org.eclipse.jetty.server.Handler$Wrapper.handle(Handler.java:740)
	at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:138)
	at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:611)
	at org.eclipse.jetty.server.Handler$Sequence.handle(Handler.java:805)
	at org.eclipse.jetty.server.Server.handle(Server.java:182)
	at org.eclipse.jetty.server.internal.HttpChannelState$HandlerInvoker.run(HttpChannelState.java:677)
	at org.eclipse.jetty.util.thread.Invocable$ReadyTask.run(Invocable.java:177)
	at org.eclipse.jetty.http2.server.internal.HttpStreamOverHTTP2$1.run(HttpStreamOverHTTP2.java:144)
	...

We can see that there is a failure detected in ChannelCallback.succeeded() because SessionStreamWrapper.failed is ultimately called. This means that there must have been one of the following application errors:

These are all plausible application errors, especially with something like server sent events.

Once thread 258 has called completeStream, it returns all the way out of handling and can be seen calling completeStream again in this stack trace:

java.lang.Exception: complete stack
	...
	at org.eclipse.jetty.session.AbstractSessionManager.complete(AbstractSessionManager.java)
	at org.eclipse.jetty.session.AbstractSessionManager$SessionStreamWrapper.doComplete(AbstractSessionManager.java:1509)
	at org.eclipse.jetty.server.handler.ContextHandler$ScopedContext.run(ContextHandler.java:1524)
	at org.eclipse.jetty.session.AbstractSessionManager$SessionStreamWrapper.failed(AbstractSessionManager.java:1479)
	at org.eclipse.jetty.server.internal.HttpChannelState$HandlerInvoker.completeStream(HttpChannelState.java:788)
	at org.eclipse.jetty.server.internal.HttpChannelState$HandlerInvoker.run(HttpChannelState.java:712)
	at org.eclipse.jetty.util.thread.Invocable$ReadyTask.run(Invocable.java:177)
	at org.eclipse.jetty.http2.server.internal.HttpStreamOverHTTP2$1.run(HttpStreamOverHTTP2.java:144)
	...

which is called from this code after the handler has been invoked:

try (AutoLock ignored = _lock.lock())
{
    stream = _stream;
    _handling = null;
    _handled = true;
    failure = _callbackFailure;
    callbackCompleted = _callbackCompleted;
    lastStreamSendComplete = lockedIsLastStreamSendCompleted();
    completeStream = callbackCompleted && lastStreamSendComplete;
    if (LOG.isDebugEnabled())
        LOG.debug("handler invoked: completeStream={} failure={} callbackCompleted={} {}", completeStream, failure, callbackCompleted, HttpChannelState.this);
}
if (LOG.isDebugEnabled())
    LOG.debug("stream={}, failure={}, callbackCompleted={}, completeStream={}", stream, failure, callbackCompleted, completeStream);
if (completeStream)
{
    if (LOG.isDebugEnabled())
        LOG.debug("completeStream({}, {})", stream, Objects.toString(failure));
    completeStream(stream, failure);
}

Note that for this code to actually call completeStream, callbackCompleted && lastStreamSendComplete must be true. This is normally not the case for HTTP/1, because the first call to completeStream will have recycled the HttpChannelState, so callbackCompleted and lastStreamSendComplete will both be false. However, for H2, HttpChannelStates are reused after being recycled, so another request may have come in and set the fields of the state again, and the second call to completeStream then incorrectly completes that new request.
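To make the reuse scenario concrete, here is a minimal single-threaded toy model of it. This is only a sketch: ToyChannelState and all of its fields and methods are hypothetical names invented for illustration, not Jetty's HttpChannelState, and the "other thread" is simulated by an ordinary method call at the point where the race would occur.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: ToyChannelState is a hypothetical stand-in, not Jetty's HttpChannelState.
final class RecycledStateDemo
{
    static final class ToyChannelState
    {
        String request;              // request currently bound to this state, or null when idle
        boolean callbackCompleted;
        boolean lastSendComplete;
        final List<String> completed = new ArrayList<>();

        void begin(String newRequest)
        {
            request = newRequest;
            callbackCompleted = false;
            lastSendComplete = false;
        }

        // Marks the callback and last-send legs as done for the bound request.
        void finishCallbackAndSend()
        {
            callbackCompleted = true;
            lastSendComplete = true;
        }

        // Completes whatever request is bound right now, then recycles the state.
        void completeStream()
        {
            completed.add(request);
            request = null;
            callbackCompleted = false;
            lastSendComplete = false;
        }

        // The check made after the handler returns.
        boolean shouldCompleteAfterHandling()
        {
            return callbackCompleted && lastSendComplete;
        }
    }

    // reuseBeforeReturn=false models the HTTP/1 case; true models a reused H2 state.
    static List<String> simulate(boolean reuseBeforeReturn)
    {
        ToyChannelState state = new ToyChannelState();
        state.begin("request-1");

        // Inside the handler: the callback detects a failure and completes early.
        state.completeStream();

        if (reuseBeforeReturn)
        {
            // Simulated second thread: the recycled state is reused for a new request.
            state.begin("request-2");
            state.finishCallbackAndSend();
        }

        // The first thread unwinds out of handling and re-checks the shared fields.
        if (state.shouldCompleteAfterHandling())
            state.completeStream();

        return state.completed;
    }

    public static void main(String[] args)
    {
        System.out.println("no reuse: " + simulate(false));
        System.out.println("reuse:    " + simulate(true));
    }
}
```

In the toy model, simulate(false) completes only request-1, while simulate(true) also completes request-2 from the code path that belonged to request-1, which is the shape of the problem described above.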

Thus I believe the core fix is to not call completeStream whilst we are still handling. Furthermore, if we are to ignore the last write leg of the stool, we should explicitly force lastStreamSendComplete to true.
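The idea can be sketched as a small gate that lets exactly one party complete the stream. This is an illustrative toy under stated assumptions, not the actual patch: the class, fields, and method names below are all hypothetical, and a plain monitor stands in for the real locking.

```java
// Toy model only: names are hypothetical and do not mirror Jetty's HttpChannelState.
final class SingleCompletionSketch
{
    private final Object lock = new Object();
    private boolean handling;          // true while the handler is running
    private boolean callbackCompleted;
    private boolean lastSendComplete;
    private int completions;

    void startHandling()
    {
        synchronized (lock)
        {
            handling = true;
        }
    }

    // The callback reports a failure on an already committed response.
    void callbackFailedAfterCommit()
    {
        boolean complete;
        synchronized (lock)
        {
            callbackCompleted = true;
            lastSendComplete = true;    // explicitly give up on the last write leg
            complete = !handling;       // defer to the handling thread if it is still running
        }
        if (complete)
            completeStream();
    }

    // Called by the handling thread once the handler has returned.
    void handlerReturned()
    {
        boolean complete;
        synchronized (lock)
        {
            handling = false;
            complete = callbackCompleted && lastSendComplete;
        }
        if (complete)
            completeStream();
    }

    private void completeStream()
    {
        synchronized (lock)
        {
            completions++;
        }
    }

    int completions()
    {
        synchronized (lock)
        {
            return completions;
        }
    }

    // Failure arrives while the handler is still on the stack.
    static int failureDuringHandling()
    {
        SingleCompletionSketch state = new SingleCompletionSketch();
        state.startHandling();
        state.callbackFailedAfterCommit();
        state.handlerReturned();
        return state.completions();
    }

    // Failure arrives after the handler has already returned.
    static int failureAfterHandling()
    {
        SingleCompletionSketch state = new SingleCompletionSketch();
        state.startHandling();
        state.handlerReturned();
        state.callbackFailedAfterCommit();
        return state.completions();
    }

    public static void main(String[] args)
    {
        System.out.println("during handling: " + failureDuringHandling());
        System.out.println("after handling:  " + failureAfterHandling());
    }
}
```

In both orderings the toy completes the stream exactly once: the callback defers when the handler is still running, and the handling thread picks up the completion when it returns.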

Unfortunately I have been unable to produce a unit test for this, as I believe it needs precisely unlucky timing and an application error.

gregw commented Oct 21, 2025

@sbordet @lorban Can you review the diagnosis that @janbartel and I have come up with? I'm 90% sure this is it, but I cannot reproduce it (any thoughts on how we might be able to do that?).

gregw commented Oct 21, 2025

Note that we added this completeStream call in #9684

Contributor

@sbordet sbordet left a comment

ChannelCallback.failed() seems not entirely correct either.

Can we write test cases for this scenario?

Comment on lines 1583 to 1584
// We are committed and still handling, so let the HandlerInvoker complete, ignoring any pending reads/writes.
httpChannelState._streamSendState = StreamSendState.LAST_COMPLETE;
Contributor

Again, not sure we should ignore pending writes.

Contributor Author

Or pending reads... it is a difficult one. I'll at least change the code to enumerate the possible states.

gregw commented Oct 21, 2025

> ChannelCallback.failed() seems not entirely correct either.

@sbordet I will look.

> Can we write test cases for this scenario?

Very hard, because unless another thread is racing, the second completeStream is not called. I'm open to suggestions.

gregw added 3 commits October 22, 2025 11:02
Fix #13470 by calling completeStream only once even when there is a failure in the channel callback.

refactor success and failure into single method.
Fix #13470 by calling completeStream only once even when there is a failure in the channel callback.

refactor success and failure into single method.

Improved EventSourceServlet
Fix #13470 by calling completeStream only once even when there is a failure in the channel callback.

refactor success and failure into single method.

Improved EventSourceServlet

gregw commented Oct 22, 2025

@sbordet @lorban I'm getting concerned at the number of tests this PR is breaking in its current state.
I think we need to take time to consider the more "cleanup" changes and probably only make them in 12.1.x.

So I propose that this PR should simply be 83c1718 for 12.0.x (perhaps with the EventSourceServlet cleanups), and then we can do a wider cleanup and refactor in 12.1.x next month.

@gregw gregw requested a review from sbordet October 22, 2025 19:50
gregw added 3 commits October 23, 2025 07:33
Fix #13470 by calling completeStream only once even when there is a failure in the channel callback.

refactor success and failure into single method.

Improved EventSourceServlet
Fix #13470 by calling completeStream only once even when there is a failure in the channel callback.

refactor success and failure into single method.

Improved EventSourceServlet
Fix #13470 by calling completeStream only once even when there is a failure in the channel callback.

refactor success and failure into single method.

Improved EventSourceServlet

lorban commented Oct 23, 2025

Regarding the minimal 12.0 fix, I think httpChannelState._handling == null should actually be httpChannelState._handled

gregw commented Oct 24, 2025

@lorban this is passing tests now.... so let's go for this one?
@sbordet @lorban Review please!!

Comment on lines +1566 to +1568
Throwable unconsumed = stream.consumeAvailable();
if (failure != null)
    ExceptionUtil.addSuppressedIfNotAssociated(failure, unconsumed);
Contributor

Just ExceptionUtil.combine(failure, stream.consumeAvailable()); as the line above?
But then, this else block is identical to the else-if above?

Contributor Author

No, this is different. If failure == null, then it remains null, i.e. it is not a failure to not consume. I'll add a comment.
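The distinction can be shown with two small helpers. These are hypothetical stand-ins written for this example and are not Jetty's ExceptionUtil; the sketch assumes, as the reply implies, that a "combine" style call turns a null failure into the unconsumed throwable, whereas the branch under review leaves a null failure as null.

```java
import java.io.IOException;

// Hypothetical stand-ins written for this example; they are not Jetty's ExceptionUtil.
final class UnconsumedFailureSketch
{
    // "Combine" style: a null failure is replaced by the second throwable.
    static Throwable combine(Throwable failure, Throwable other)
    {
        if (failure == null)
            return other;
        if (other != null && other != failure)
            failure.addSuppressed(other);
        return failure;
    }

    // "Suppress only" style: a null failure stays null, so not consuming is not itself a failure.
    static Throwable suppressOnly(Throwable failure, Throwable other)
    {
        if (failure != null && other != null && other != failure)
            failure.addSuppressed(other);
        return failure;
    }

    public static void main(String[] args)
    {
        Throwable unconsumed = new IOException("unconsumed content");

        // With no prior failure, the two styles disagree.
        System.out.println("combine:      " + combine(null, unconsumed));
        System.out.println("suppressOnly: " + suppressOnly(null, unconsumed));

        // With a prior failure, both keep it and attach the unconsumed throwable.
        Throwable failure = new IOException("callback failure");
        Throwable result = suppressOnly(failure, unconsumed);
        System.out.println("suppressed:   " + result.getSuppressed().length);
    }
}
```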

Contributor Author

Actually, I will try without the whole branch....

Contributor Author

Branch is needed

private class HandlerInvoker implements Invocable.Task, Callback
// HandlerInvoker is used as the Response's _writeCallback when ChannelCallback is succeeded and the last send still
// needs to be done, i.e.: _streamSendState set to LAST_SENDING by lockedLastStreamSend().
private class HandlerInvoker implements Task, Callback
Contributor

I think we should split the functionality of this class.
Leave run() in HandlerInvoker, but move the Callback functionality into a LastStreamSendCallback class.

Contributor Author

This is really not a 12.0 thing then

Comment on lines +1623 to +1624
failedCallback = response._writeCallback;
response._writeCallback = httpChannelState._handlerInvoker;
Contributor

In this way, we may wait forever for the write to complete and invoke the _handlerInvoker.

How about this:

Suggested change
failedCallback = response._writeCallback;
response._writeCallback = httpChannelState._handlerInvoker;
Runnable task = response.lockedFailWrite(failure);
failedCallback = Callback.from(task, httpChannelState._handlerInvoker);

In 12.1.x we will leverage the new cancelSend() feature automatically.

Contributor Author

I'm not sure that continuing on with a pending write pointing at the HttpChannelState is a good idea. It feels like an invitation for a race. What is wrong with waiting for the write to complete? It won't be forever unless they have disabled the idle timeout.

{
assert _callbackCompleted;
_streamSendState = StreamSendState.LAST_COMPLETE;
completeStream = _handling == null;
Contributor

Suggested change
completeStream = _handled;

Contributor Author

I tried, but we also need to complete a stream in cases where we have not started handling yet.
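The difference between the two gates can be tabulated with a small toy model. This is a sketch under an assumption the thread implies but does not show in code: before handling starts, _handling is null and _handled is false. The Phase enum and both gate methods are hypothetical names for illustration only.

```java
// Toy model only: Phase and both gates are hypothetical names for illustration.
final class HandlingPhaseSketch
{
    enum Phase
    {
        NOT_STARTED(false, false),   // handler not yet invoked (assumed initial state)
        IN_PROGRESS(true, false),    // handler currently running
        HANDLED(false, true);        // handler has returned

        final boolean handlingSet;   // models "_handling != null"
        final boolean handled;       // models "_handled"

        Phase(boolean handlingSet, boolean handled)
        {
            this.handlingSet = handlingSet;
            this.handled = handled;
        }
    }

    // Gate used in the change under review: complete when nothing is currently handling.
    static boolean gateOnHandlingNull(Phase phase)
    {
        return !phase.handlingSet;
    }

    // Gate from the suggested change: complete only once handling has finished.
    static boolean gateOnHandled(Phase phase)
    {
        return phase.handled;
    }

    public static void main(String[] args)
    {
        for (Phase phase : Phase.values())
        {
            System.out.printf("%-11s _handling==null -> %-5b _handled -> %b%n",
                phase, gateOnHandlingNull(phase), gateOnHandled(phase));
        }
    }
}
```

Under that assumption the two gates agree while handling is in progress and after it has finished, and differ only in the not-yet-started phase, which is the case the reply says must still complete the stream.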

@gregw gregw requested a review from sbordet October 24, 2025 21:04

Successfully merging this pull request may close these issues.

Jetty 12.0: ManagedSession issues due to recursion and/or multiple completions of the stream.